VAST Challenge MC2 - Part 2

This post presents the step by step instructions that were followed to identify which GASTech employess made which purchases and identify suspicious patterns of behavior using visual analytic techniques.

Mayurapriyann Arulmozhi Baskaran https://www.linkedin.com/in/mayurapriyann/
07-25-2021
packages = c('raster','sf','tmap','tidyverse','clock','rgdal','tidytext','widyr',
             'DT','dplyr','hms','ggraph','igraph','crosstalk',
             'plotly','data.table','stringi','mapview','ggridges','networkD3',
             'htmlwidgets')

for(p in packages){
  if(!require(p, character.only = T)){
    install.packages(p)
  }
  library(p, character.only = T)
}

Loading data

cc <- read_csv(file = 'data/cc_data.csv')
lc <- read_csv(file = 'data/loyalty_data.csv')
bgmap <- raster("data/MC2-tourist.tif")
Abila_st <- st_read(dsn = "data/Geospatial",
                    layer = "Abila")
Reading layer `Abila' from data source 
  `C:\abmayur05\VisualAnalytics\_posts\2021-07-26-vast-challenge-mc2-part-2\data\Geospatial' 
  using driver `ESRI Shapefile'
Simple feature collection with 3290 features and 9 fields
Geometry type: LINESTRING
Dimension:     XY
Bounding box:  xmin: 24.82401 ymin: 36.04502 xmax: 24.90997 ymax: 36.09492
Geodetic CRS:  WGS 84
gps <- read_csv("data/gps.csv")
net <- read_csv(file = 'data/car-assignments.csv')

Task 2

Add the vehicle data to your analysis of the credit and loyalty card data. How does your assessment of the anomalies in question 1 change based on this new data? What discrepancies between vehicle, credit, and loyalty card data do you find? Please limit your answer to 8 images and 500 words.

Plotting Raster Layer

tm_shape(bgmap) +
  tm_rgb(bgmap, r = 1, g = 2, b = 3,
         alpha = NA,
         saturation = 1,
         interpolate = TRUE,
         max.value = 255)

The map of Abila gives us a clear picture of the distance that GASTech is away from the other locations visited by the employees. From the map of Abila, the most frequented locations such as Katerina’s Cafe, Guy’s Gyros, Brew’ve Been Served were all seems to be in close proximity with GASTech. Hippokampos was the second most frequented location but surprisingly it was not present in the map. Since the top 4 frequented locations are closely located, I assume Hippokampos also should be near to GASTech.

Converting Date-Time field and ID field

gps$Timestamp <- date_time_parse(gps$Timestamp,
                                 zone = "",
                                 format = "%m/%d/%Y %H:%M:%S")
gps$id <- as_factor(gps$id)

Converting Aspatial Data Into A Simple Feature data Frame

gps_sf <- st_as_sf(gps,
                   coords = c("long","lat"),
                   crs = 4326)
gps_sf <- gps_sf %>%
  mutate(day = get_day(Timestamp),
         hour = get_hour(Timestamp),
         dayofweek = date_weekday_factor(Timestamp) ,
         date = as_date(Timestamp))
glimpse(gps_sf)
Rows: 685,169
Columns: 7
$ Timestamp <dttm> 2014-01-06 06:28:01, 2014-01-06 06:28:01, 2014-01~
$ id        <fct> 35, 35, 35, 35, 35, 35, 35, 35, 35, 35, 35, 35, 35~
$ geometry  <POINT [°]> POINT (24.87469 36.07623), POINT (24.8746 36~
$ day       <int> 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6,~
$ hour      <int> 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6,~
$ dayofweek <ord> Mon, Mon, Mon, Mon, Mon, Mon, Mon, Mon, Mon, Mon, ~
$ date      <date> 2014-01-06, 2014-01-06, 2014-01-06, 2014-01-06, 2~

Creating Movement Path From GPS Points

The code chunk below joins the GPS points into movement paths by using car id, date and hour as unique identifiers.

gps_path <- gps_sf %>%
  group_by(id, date, hour) %>%
  summarize(timestamp= mean(Timestamp),
            do_union=FALSE) %>%
  st_cast("LINESTRING")

Finding the orphan lines

After getting the movement paths we do see some orphan GPS points hence finding those orphan lines.

p = npts(gps_path, by_feature = TRUE)
gps_path2 <- cbind(gps_path, p)

Removing the orphan lines

gps_path3 <- gps_path2[!(gps_path2$p==1),]

Plotting the GPS Paths

Below code chunk is used to overplot the GPS path of all the car ids onto the background tourist map.

gps_path_selected <- gps_path3
tmap_mode("view")
tm_shape(bgmap) +
  tm_rgb(bgmap, r = 1, g = 2, b = 3,
         alpha = NA,
         saturation = 1,
         interpolate = TRUE,
         max.value = 255) +
  tm_shape(gps_path_selected) +
  tm_lines(col='red')

Previously using the heat map plot on location vs hour of transaction shows that there are some transactions recorded at early morning around 3 am - 4 am at Kronos Mart. Hence further plotting the GPS path during the time period to see if any car id had passed by that way.